Operating system decoupled heterogeneous computing

ABSTRACT

A heterogeneous processing system is described herein that provides a software hypervisor to autonomously control operating system thread scheduling across big and little cores without the operating system's awareness or involvement to improve energy efficiency or meet other processing goals. The system presents a finite set of virtualized compute cores to the operating system to which the system schedules threads for execution. Subsequently, the hypervisor intelligently controls the physical assignment and selection of which core(s) execute each thread to manage energy use or other processing requirements. By using a software hypervisor to abstract the underlying big and little computer architecture, the performance and power operating differences between the cores remain opaque to the operating system. The inherent indirection also decouples the release of hardware with new capabilities from the operating system release schedule.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of Ser. No. 13/155,387, filed on Jun. 8, 2011.

BACKGROUND

Energy efficiency is increasingly becoming an important differentiator from mobile phones to datacenters. Customers are willing to pay a premium for longer-lasting mobile device experiences but are also anxious to get increasing performance from these same devices. On the other end of the scale, datacenters continue to scale up compute power but face thermal limits on what can be efficiently cooled. In addition, the public is increasingly conscious of energy usage and the environmental impact of energy use. Making efficient use of energy is therefore a higher-priority design goal in many types of computing systems.

These technically opposing agendas—delivering more performance but using less power—have resulted in the industry experimenting with heterogeneous designs of “big” compute cores closely coupled with “little” compute cores within a single system or silicon chip, called heterogeneous cores or heterogeneous processing herein. The big cores are designed to offer high performance in a larger power envelope, while the little cores are designed to offer lower performance in a smaller power envelope. The conventional wisdom is that an operating system's scheduler will then selectively schedule threads on the big or little cores depending upon the workload(s). During at least some times of the day, the operating system may be able to turn off the big core(s) entirely and rely on the power-sipping little cores.

Big and little cores may or may not share the same instruction set or features. For example, little cores may include a reduced instruction set or other differences that involve further decision making by the operating system to schedule processes on a compatible core. One traditional example is a system that includes a central processing unit (CPU) and a graphics-processing unit (GPU) and allows the GPU to be used for computing tasks when it is idle or underutilized.

Existing solutions depend on modifying the operating system's kernel to “enlighten” the operating system to the presence of big and little cores, their respective performance and power characteristics, and which facilities in the system (e.g., CPU performance counters, cache miss/hit counters, bus activity counters, and so on) the operating system can monitor to determine on which core(s) to schedule a particular thread. This approach has several drawbacks: 1) it involves modifying the kernel for all supported operating systems, 2) it requires the modified kernel to understand differences in big/little designs across potentially different architectures (e.g., supporting N different implementations), and 3) it tightly couples the release schedule of the operating system kernel to the underlying computer architecture. Changes to the computer architecture then involve waiting for the next scheduled operating system release (i.e., potentially several years or more) before the kernel can support new cores commercially (or vice versa).

SUMMARY

A heterogeneous processing system is described herein that provides a software hypervisor to autonomously control operating system thread scheduling across big and little cores without the operating system's awareness or involvement to improve energy efficiency or meet other processing goals. The system presents a finite set of virtualized compute cores to the operating system to which the system schedules threads for execution. Subsequently, underneath the surface, the hypervisor intelligently controls the physical assignment and selection of which core(s)—big or little—execute each thread to manage energy use or other processing requirements. By using a software hypervisor to abstract the underlying big and little computer architecture, the performance and power operating differences between the cores remain opaque to the operating system. The inherent indirection also decouples the release of hardware with new capabilities from the operating system release schedule. A hardware vendor can release an updated hypervisor, and allow new hardware to work with any operating system version the vendor chooses.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram that illustrates components of the heterogeneous processing system, in one embodiment.

FIG. 2 is a flow diagram that illustrates processing of the heterogeneous processing system to initialize a computing device with heterogeneous processing cores using a hypervisor between the cores and an operating system, in one embodiment.

FIG. 3 is a flow diagram that illustrates processing of the heterogeneous processing system to schedule one or more operating system threads through a hypervisor that manages heterogeneous processing cores, in one embodiment.

FIG. 4 is a block diagram that illustrates an operating environment of the heterogeneous processing system, in one embodiment.

DETAILED DESCRIPTION

A heterogeneous processing system is described herein that provides a software hypervisor to autonomously control operating system thread scheduling across big and little cores without the operating system's awareness or involvement to improve energy efficiency or meet other processing goals. The system presents a finite set of virtualized compute cores to the operating system to which the system schedules threads for execution. Subsequently, underneath the surface, the hypervisor intelligently controls the physical assignment and selection of which core(s)—big or little—execute each thread to manage energy use or other processing requirements. By using a software hypervisor to abstract the underlying big and little computer architecture, the performance and power operating differences between the cores remain opaque to the operating system. The inherent indirection also decouples the release of hardware with new capabilities from the operating system release schedule. A hardware vendor can release an updated hypervisor, and allow new hardware to work with any operating system version the vendor chooses.

The hypervisor implementation is tightly coupled to the underlying computer architecture and uses the available system feedback (e.g., CPU utilization, bus/cache activity, and so forth) to autonomously assign the appropriate cores for the requested workloads. This approach allows the underlying computer architecture to change frequently in cooperation with the software hypervisor and decouples this evolution from the operating system(s) above. The heterogeneous processing system provides simple, coarse-grained power management without modifying the operating system kernel itself. Thus, the heterogeneous processing system allows for more rapid hardware innovation, and allows existing datacenter and other installations to benefit today from available heterogeneous processing hardware.

Heterogeneous computing is an emerging field within the industry with the goal of optimizing the execution of workloads based on the different types of computing cores (e.g., CPUs, GPUs, accelerators, and so on) available in a system. Optimization can be for performance, power, latency, or other goals. The heterogeneous processing system, while applicable to these more general cases, is also targetable at systems with cores that have functional equivalence but differing performance/power operating characteristics. Typically, these systems have one or more big cores and one or more little cores. The big cores typically have deep pipelines, out-of-order execution, large caches, and high clock speeds, and are manufactured using higher leakage processes (e.g., 40G). The little cores typically have shorter pipelines, smaller caches, lower clock speeds, and various power levels, and are manufactured using low leakage processes (e.g., 40LP).

In some embodiments, the big and little cores may have architecture equivalence, micro-architecture equivalence, a global interrupt controller, coherency, and virtualization. Architecture equivalence may include the same Instruction Set Architecture (ISA), Single Instruction Multiple Data (SIMD) support, Floating Point (FP) support, co-processor availability, and ISA extensions. Micro-architecture equivalence may include differences in performance but the same configurable features (e.g., cache line length). A global interrupt controller provides the ability to manage, handle, and forward interrupts to all cores. Coherency means all cores can access (cached) data from other cores, with forwarding as needed. Virtualization is for switching/migrating workloads from/to cores.

In some embodiments, the heterogeneous processing system may be able to handle minor differences in cores. For example, a little core that does not support Streaming Single Instruction, Multiple Data (SIMD) Extensions (SSE) (now existing in four iterations: SSE1, SSE2, SSE3, and SSE4) may still handle other Intel x86-based software code. The hypervisor may detect unsupported instructions in the instruction stream, and wake up an appropriate core to which to assign such streams. Other instruction streams may operate faithfully on any core. In some cases, such as where only a handful of unsupported instructions are used, the hypervisor may include some level of emulation to emulate the unsupported instructions on the available instruction set. For example, operations such as vector math can often be broken down and implemented at lower efficiency using standard math instructions.
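For purposes of illustration only, the following C sketch decomposes a four-wide packed-float add (the kind of operation a single SSE instruction performs in hardware) into scalar adds. The function name and fixed four-lane width are assumptions made for this sketch, not part of the described implementation:

    #include <stdio.h>

    /* Illustrative only: emulate a 4-wide packed-float add (similar in
     * spirit to SSE ADDPS) using one scalar add per vector lane. A real
     * hypervisor would perform such a decomposition inside its handler
     * for trapped, unsupported instructions. */
    static void emulated_vec4_add(const float a[4], const float b[4],
                                  float out[4]) {
        for (int i = 0; i < 4; i++) {
            out[i] = a[i] + b[i];  /* one scalar add per lane */
        }
    }

    int main(void) {
        float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, r[4];
        emulated_vec4_add(a, b, r);
        printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);
        return 0;
    }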

The software hypervisor installs itself during the device boot process prior to operating system (OS) initialization. After completing specified hardware configuration (i.e., configuring memory, initializing the virtualization facilities, and so on), the hypervisor then configures the big and little processing cores installed in the computing device via policy. For example, if the device is a mobile phone, the policy could dictate that the hypervisor start the operating system with a minimal amount of performance available and optimize for battery life; the hypervisor would subsequently schedule operating system threads to one or more little cores. Alternatively, if the device is a datacenter blade, the policy could dictate that the hypervisor start the operating system with the maximal amount of available performance and sacrifice energy efficiency; the hypervisor would subsequently schedule operating system threads to the available big cores—as well as possibly the little cores, depending on the available thermal budget. After completing initialization, the software hypervisor loads the operating system boot manager, which then loads the operating system.
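The boot-time, policy-driven configuration described above might be pictured with the following sketch; the device classes, policy identifiers, and selection rule are illustrative assumptions rather than the disclosed behavior:

    #include <stdio.h>

    /* Illustrative boot-time policy selection by device class. */
    typedef enum { DEVICE_PHONE, DEVICE_LAPTOP, DEVICE_BLADE } device_class_t;
    typedef enum { POLICY_MIN_POWER, POLICY_MIPOD, POLICY_MAX_PERF } policy_t;

    static policy_t select_boot_policy(device_class_t dev) {
        switch (dev) {
        case DEVICE_PHONE:  return POLICY_MIN_POWER; /* little cores, battery first */
        case DEVICE_LAPTOP: return POLICY_MIPOD;     /* low power, boost on demand */
        case DEVICE_BLADE:  return POLICY_MAX_PERF;  /* big cores, thermal budget */
        }
        return POLICY_MIN_POWER;  /* defensive default */
    }

    int main(void) {
        printf("blade policy id: %d\n", select_boot_policy(DEVICE_BLADE));
        return 0;
    }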

During runtime, the heterogeneous processing system presents a virtualized set of cores to the operating system. The operating characteristics and differences between the cores are opaque to the operating system and managed privately by the software hypervisor based upon the defined operating policy. The operating policy may be set during system initialization or dynamically during runtime.

The hypervisor uses the operating policy in conjunction with available system facilities (e.g., CPU performance counters, cache miss/hit counters, bus activity counters, and so on) to determine the cores to which to schedule the operating system threads. The hypervisor uses this information to understand CPU core utilization, trends over time, locality of information, and input/output (I/O) patterns. From this information, the hypervisor can dynamically and speculatively migrate the operating system threads across the big and little cores as appropriate. Additionally, the hypervisor may also control dynamic voltage and frequency scaling (DVFS) on behalf of the operating system, depending on the system implementation.
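As a sketch of how such counters might feed a placement decision (the thresholds, field names, and rule below are assumptions for illustration, not the disclosed algorithm):

    #include <stdbool.h>
    #include <stdio.h>

    /* Feedback a hypervisor might sample for a virtual core. */
    struct core_feedback {
        double cpu_utilization;  /* 0.0 .. 1.0 over the last sample window */
        double cache_miss_rate;  /* misses per access */
    };

    /* Returns true if the workload looks like it belongs on a big core:
     * sustained high utilization suggests a little core is saturated, and
     * a high miss rate suggests bigger caches would help. */
    static bool prefer_big_core(const struct core_feedback *fb) {
        return fb->cpu_utilization > 0.85 || fb->cache_miss_rate > 0.10;
    }

    int main(void) {
        struct core_feedback fb = { .cpu_utilization = 0.92,
                                    .cache_miss_rate = 0.04 };
        printf("big core? %s\n", prefer_big_core(&fb) ? "yes" : "no");
        return 0;
    }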

Here is a sampling of available operating policies the hypervisor may control: Minimum Power (MiPo); Maximum Performance (MaPe); Minimal Power, Performance on Demand (MiPoD); and Maximum Performance, Power Down on Idle (MaPeI). Each of these is described in the following paragraphs, and an illustrative sketch after those descriptions shows one way such policies might be encoded. However, additional, more advanced operating policies can be implemented as chosen by any particular implementation.

Minimum Power (MiPo) schedules threads to the minimal set of cores. This typically will mean the hypervisor schedules threads to the little cores and uses DVFS as needed to control the power and performance operating point for the core. Additional little cores can be powered and scheduled as needed.

Maximum Performance (MaPe) schedules threads to the maximal set of cores. This typically will mean the hypervisor schedules threads to all available cores—starting with the big cores—and uses DVFS as needed to control the power and performance operating point for the cores. The little cores are also powered and scheduled as much as the available thermal budget allows.

Minimal Power, Performance on Demand (MiPoD) normally operates at the lowest available power state (e.g., on one or more little cores) but boosts performance as workloads demand. This is commonly referred to as a “turbo” or “boost” mode of operation and is enabled by dynamically allocating and scheduling to big cores. Once the workload is completed, the system returns to the minimal power state (e.g., on a little core).

Maximum Performance, Power Down on Idle (MaPeI) normally operates at the maximal available performance state (e.g., on one or more big cores) but acquiesces to lower power states once an idle threshold is reached. The idle threshold in this case is not the typical near-zero CPU utilization but can be arbitrarily defined at some Dhrystone Million Instructions per Second (DMIPS) or CPU utilization percentage as defined by the policy. When going to idle, the hypervisor dynamically allocates and schedules to little cores and puts the unused big cores into standby/parked states. Policy and/or future workloads determine when the system returns to the maximum available performance state (e.g., on big cores).
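The sketch below encodes the four policies described above as a simple dispatch over core sets. The two-big/two-little machine and the omission of DVFS, thermal budget, and configurable idle thresholds are simplifying assumptions:

    #include <stdio.h>

    typedef enum { MIPO, MAPE, MIPOD, MAPEI } policy_t;

    struct core_set { int big_cores; int little_cores; };

    /* Illustrative core-set choice for a 2-big/2-little machine. */
    static struct core_set cores_for(policy_t p, int demand_high, int idle) {
        switch (p) {
        case MIPO:  return (struct core_set){ 0, 1 };               /* little only */
        case MAPE:  return (struct core_set){ 2, 2 };               /* everything */
        case MIPOD: return demand_high ? (struct core_set){ 2, 1 }  /* boost */
                                       : (struct core_set){ 0, 1 };
        case MAPEI: return idle ? (struct core_set){ 0, 1 }         /* park big */
                                : (struct core_set){ 2, 0 };
        }
        return (struct core_set){ 0, 1 };
    }

    int main(void) {
        struct core_set s = cores_for(MIPOD, /*demand_high=*/1, /*idle=*/0);
        printf("MiPoD under load: %d big, %d little\n",
               s.big_cores, s.little_cores);
        return 0;
    }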

FIG. 1 is a block diagram that illustrates components of the heterogeneous processing system, in one embodiment. The system 100 includes one or more central processing units 110, an operating system interface component 120, a virtual core management component 130, a policy engine component 140, a policy data store 150, a scheduling component 160, a capability management component 170, and a hardware interface component 180. Each of these components is described in further detail herein. These components may be implemented within a software hypervisor that sits between an operating system and the hardware resources of a computing device.

The one or more central processing units 110 include one or more processing cores that have heterogeneous processing capabilities and power profiles. Typically, each CPU complex is located on a single silicon die and each core of a CPU complex shares a silicon die. Hardware can be implemented in a variety of packages for a variety of types of devices. For example, newer mobile devices and even some recent desktop processors include a CPU and GPU on the same chip for efficient communication between the two and lower power usage. Each CPU complex may include one or more big and little cores. Alternatively or additionally, one CPU complex may include all big cores while another CPU complex includes all little cores. The term CPU complex, as used here, also applies to GPUs and other hardware that can execute software instructions.

The operating system interface component 120 communicates between a hypervisor and an operating system to receive instructions for delivering to hardware resources and for receiving output from the hardware resources. The operating system may schedule threads, provide a pointer to an instruction stream (e.g., a program counter (PC)), write to memory areas that pass instructions to hardware, and so forth. An operating system typically interacts directly with the hardware on a computing device. However, a hypervisor inserts a layer of indirection between the operating system and hardware for a variety of purposes. Often, hypervisors are used to provide virtualization so that multiple operating systems can be run contemporaneously on the same hardware. A hypervisor can also be used to present virtual hardware to the operating system that differs from the actual hardware installed in a computing device. In the case of the heterogeneous processing system 100, this can include making big and little cores appear the same to the operating system. The system 100 may even present a different number of cores to the operating system than actually exist in the device.

The virtual core management component 130 manages one or more virtual cores that the hypervisor presents to the operating system. A virtual core appears to the operating system as a CPU core, but may differ in characteristics from available physical hardware in a computing device. For example, the virtual cores may hide differences in processing or power capabilities from the operating system, so that an operating system not designed to work with heterogeneous big and little cores can operate in a manner for which the operating system was designed. In such cases, the hypervisor provides any specialized programming needed to leverage the heterogeneous computing environment, so that the operating system need not be modified.

The policy engine component 140 manages one or more policies for scheduling operating system threads and presenting virtual cores to the operating system based on the available one or more central processing units. The policy engine component 140 may include hardcoded policies specific to a particular hypervisor implementation or may include administrator-configurable policies that can be modified to suit the particular installation goals. Policies may determine which cores are scheduled first, tradeoffs between power usage and processing goals, how cores are shut off and awoken to save power, how virtual cores are presented to the operating system, and so forth.

The policy data store 150 stores the one or more policies in a storage facility accessible to the hypervisor at boot and execution times. The policy data store 150 may include one or more files, file systems, hard drives, databases, or other storage facilities for persisting data across execution sessions of the system 100. In some embodiments, the administrator performs a setup step that takes the system 100 through a configuration phase to store an initial set of policies for use by the hypervisor.

The scheduling component 160 schedules one or more instruction streams received as threads from the operating system to one or more of the central processing units installed in the computing device. The scheduling component receives a virtual core identification from the operating system that identifies the virtual core to which the operating system requests to schedule the thread. The scheduling component 160 examines the schedule request and determines a physical core on which to schedule the thread to execute. For example, the component 160 may determine if power or processing is more relevant for the thread, and schedule to an appropriate little or big core in response. In some cases, the component 160 may avoid scheduling threads to certain cores to allow those cores to be powered down to save power.

The capability management component 170 optionally manages one or more differences between big and little processing cores. In some cases, the system 100 may only operate on processing units in which the big and little cores share the same capabilities, and the capability management component 170 is not needed. In other cases, the system 100 handles minor or major differences between available processing cores. For example, the system 100 may watch for instructions that are not supported by some cores and schedule the corresponding threads on cores that do support those instructions. In more sophisticated implementations, the component 170 may virtualize or emulate big core capabilities on little cores (or vice versa) to satisfy a power or other profile goal.

The hardware interface component 180 communicates between the hypervisor and central processing units to schedule software instructions to run on available physical cores. The hardware interface component 180 may include real memory addresses or other facilities for accessing real hardware that are hidden from other components and in particular from the guest operating system(s) managed by the hypervisor.

The computing device on which the heterogeneous processing system is implemented may include a central processing unit, memory, input devices (e.g., keyboard and pointing devices), output devices (e.g., display devices), and storage devices (e.g., disk drives or other non-volatile storage media). The memory and storage devices are computer-readable storage media that may be encoded with computer-executable instructions (e.g., software) that implement or enable the system. In addition, the data structures and message structures may be stored or transmitted via a data transmission medium, such as a signal on a communication link. Various communication links may be used, such as the Internet, a local area network, a wide area network, a point-to-point dial-up connection, a cell phone network, and so on.

Embodiments of the system may be implemented in various operating environments that include personal computers, server computers, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, digital cameras, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, set top boxes, systems on a chip (SOCs), and so on. The computer systems may be cell phones, personal digital assistants, smart phones, personal computers, programmable consumer electronics, digital cameras, and so on.

The system may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.

FIG. 2 is a flow diagram that illustrates processing of the heterogeneous processing system to initialize a computing device with heterogeneous processing cores using a hypervisor between the cores and an operating system, in one embodiment.

Beginning in block 210, the system receives a startup request to initialize a computing device. For example, a basic input/output system (BIOS), extensible firmware interface (EFI), boot loader, or other initial device software may load and invoke a hypervisor that implements the heterogeneous computing system. In some cases, the administrator will have previously performed an installation phase to install the hypervisor on the computing device, although the system can also support network boot and other non-installation scenarios commonly offered for computing devices.

Continuing in block 220, the system enumerates two or more physical processing cores of the computing device. In some embodiments, at least two cores offer different performance and power usage characteristics. However, the system may also be used where asymmetry is not present. For example, using a software hypervisor for power management could still be applicable in scenarios where a device has N physical CPUs on die but only K can be operated based upon externalities such as ambient temperature, form factor enclosure, cost of available power, etc. At boot, the hypervisor can use this “policy” information to report a virtualized set of K cores to the operating system, and this could vary upon each boot cycle. The hypervisor would be performing the same task in this scenario for symmetric cores. The system may invoke the BIOS or other underlying layer to determine how many and what kind of processors the computing device has installed, and may execute a CPUID or other similar instruction to determine information about the processing capabilities of the processors. In some embodiments, the system may include an extensibility interface through which drivers or other hypervisor extensions can be implemented and added by the hypervisor manufacturer or a third party to add support for new processing hardware to the hypervisor, without necessarily updating the hypervisor itself.
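For the CPUID probe mentioned above, a minimal user-mode sketch (assuming an x86 processor and the GCC/Clang <cpuid.h> header; a hypervisor would run comparable probes per core at a much lower level) might read:

    #include <cpuid.h>  /* GCC/Clang x86 intrinsics */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        unsigned int eax, ebx, ecx, edx;
        char vendor[13] = {0};

        if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
            return 1;
        memcpy(vendor + 0, &ebx, 4);  /* vendor string packs EBX, EDX, ECX */
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        printf("vendor: %s\n", vendor);

        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            printf("SSE2:   %s\n", (edx & bit_SSE2)   ? "yes" : "no");
            printf("SSE4.1: %s\n", (ecx & bit_SSE4_1) ? "yes" : "no");
        }
        return 0;
    }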

Continuing in block 230, the system determines capabilities of each enumerated processing core. The capabilities may include one or more power profiles offered by each core, one or more instruction sets supported by each core, performance characteristics of each core, and so forth. The system may leverage informational interfaces (such as the previously mentioned CPUID instruction) of the core itself, or information provided by a driver or other extension to the hypervisor, to determine each core's capabilities. The system uses the determined capabilities to assign threads to each core that are compatible with the core, and to perform scheduling in a manner consistent with received policies and processing goals.

Continuing in block 240, the system identifies one or more operating systems for which the hypervisor will manage access and scheduling for the enumerated physical cores. The system may access a hard drive, flash drive, or other storage of the computing device to determine which operating system to invoke after the hypervisor is initialized. The hypervisor may be designed with information about various operating systems, and may include extensibility so that new operating systems can be supported without updating the hypervisor itself. Each operating system and operating system version may have different scheduling semantics or other nuances that the hypervisor handles to allow the operating system to execute correctly on virtualized processing resources. In some cases, the hypervisor may be requested to allow multiple operating systems to share the enumerated physical processing cores, and policy may dictate how that sharing is handled.

Continuing in block 250, the system accesses hypervisor policy information that specifies one or more goals for scheduling operating system threads on the enumerated physical processing cores. The goals may include performance goals, power usage goals, or other directions for determining the core or cores on which to execute operating system threads. The policy may be stored in a storage device associated with the computing device, hardcoded into a hypervisor implementation, and so forth. The hypervisor may receive updates to the policy through an administrative interface provided to administrators.

Continuing in block 260, the system creates one or more virtual cores to expose to the identified operating system, wherein each virtual core isolates the operating system from determined differences in capabilities among the physical processing cores. For example, the heterogeneous processing system may present two or more big and little cores as a single type of uniform virtual core to the operating system. Upon receiving a scheduling request from the operating system to execute a thread on a virtual core, the system determines which physical core to select for the job based on the accessed hypervisor policy. The hypervisor policy may specify that the hypervisor present a different number of virtual cores than physical cores, such as when it is a goal to be able to seamlessly power down at least some higher power demanding cores in favor of using lower power demanding cores. Alternatively, the system may still power down cores the operating system is aware of but wake up the cores if the operating system chooses to use them or to use a quantity of cores that cannot be satisfied by lower powered cores alone.
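One way to sketch the virtual-to-physical indirection described above (the core counts and fields are illustrative assumptions; the point is only that the OS-visible virtual core is an index the hypervisor can retarget at will):

    #include <stdio.h>

    enum core_class { LITTLE, BIG };

    struct phys_core { enum core_class cls; int online; };

    #define N_PHYS 4
    #define N_VIRT 2  /* policy chose to expose fewer cores than exist */

    static struct phys_core phys[N_PHYS] = {
        { LITTLE, 1 }, { LITTLE, 1 }, { BIG, 0 }, { BIG, 0 }
    };
    static int virt_to_phys[N_VIRT] = { 0, 1 };  /* start on little cores */

    /* Retarget a virtual core to a physical core; the OS never sees this. */
    static void remap(int vcore, int pcore) {
        phys[virt_to_phys[vcore]].online = 0;  /* old core may be parked */
        phys[pcore].online = 1;
        virt_to_phys[vcore] = pcore;
    }

    int main(void) {
        remap(0, 2);  /* demand rose: move virtual core 0 onto a big core */
        printf("vcore 0 -> pcore %d\n", virt_to_phys[0]);
        return 0;
    }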

Continuing in block 270, the system invokes the identified operating system and presents the created virtual cores to the operating system while isolating the identified operating system from the enumerated physical processing cores. Invoking the operating system may include invoking an operating system loader and presenting the hypervisor in place of the usual BIOS or other layer underlying the operating system. The operating system operates as if it is running directly on the physical hardware, but the hypervisor sits between the operating system and physical hardware to perform the scheduling logic without the operating system's knowledge as described herein. After block 270, these steps conclude.

FIG. 3 is a flow diagram that illustrates processing of the heterogeneous processing system to schedule one or more operating system threads through a hypervisor that manages heterogeneous processing cores, in one embodiment.

Beginning in block 310, the system receives a thread scheduling request from an operating system to run instructions of a thread on an identified virtual core presented by a hypervisor, wherein the virtual core isolates the operating system from one or more capability differences between two or more physical processing cores accessible to a computing device. An operating system typically includes an idle loop for each detected processing core in which the operating system can schedule and place any instructions that the operating system wants to run on that core. The operating system may time slice multiple application threads to run on a particular processing core. Regardless of the particular virtual core that the operating system selects for executing a thread, the hypervisor may select any particular physical processing core to execute the thread in accordance with one or more hypervisor policies.

Continuing in block 320, the system determines the processing needs of the received scheduling request. For example, the system may determine a particular instruction set used by the scheduled thread (e.g., whether one or more instruction set extensions, coprocessors, or other capabilities are being requested), performance requirements of the thread, whether the thread is suitable for slower execution at lower power usage, whether the thread can be delayed until additional processing resources are available, and so forth. The system may use specific knowledge about particular operating systems or instructions received through policy to determine processing needs of a particular thread. For example, the system may identify threads related to the operating system's internal operation, application threads, and so on, and handle each according to policy.

Continuing in block 330, the system accesses a scheduling policy that specifies one or more goals for operating the device. For example, the policy may request optimization of power usage, performance, or a mix of the two. The policy may be stored in a data store associated with the device or hardcoded into particular implementations of the hypervisor. For example, the system may offer a low power usage version of the hypervisor that favors little processing cores until a thread performing a high performance task is scheduled by the operating system. At that point, the system may schedule the high performance task on a big core, then direct the big core to sleep after the task is complete.

Continuing in block 340, the system selects a physical processing core on which to execute the thread associated with the received scheduling request, wherein the selection is made based on the accessed scheduling policy. The system may have multiple available cores of differing capabilities and performance/power characteristics on which the system can schedule the thread. Based on the system's choice of core, the computing device will use more or less power and will complete the thread's execution in more or less time. The job of the scheduling policy is to allow the system to make the selection in a manner that promotes one or more goals for managing performance, power, or other characteristics of the computing device. A mobile device may prefer lower power usage, while a high performance server may prefer higher performance. In some cases, policy may differ based on time of day (e.g., peak versus non-peak electricity costs) or other considerations, so that the policy varies over time or based on certain conditions.
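A per-thread selection rule combining the policy goal of block 330 with the thread needs of block 320 might be sketched as follows; the field names and the specific rule are assumptions for illustration:

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { GOAL_POWER, GOAL_PERFORMANCE } goal_t;

    struct thread_needs {
        bool needs_big_only_isa;  /* uses extensions only big cores support */
        bool latency_sensitive;   /* slower execution would be visible */
    };

    /* Returns 1 to pick a big core, 0 to pick a little core. */
    static int select_core_class(goal_t goal, const struct thread_needs *t) {
        if (t->needs_big_only_isa) return 1;  /* correctness before policy */
        if (goal == GOAL_PERFORMANCE) return 1;
        return t->latency_sensitive ? 1 : 0;  /* power goal otherwise */
    }

    int main(void) {
        struct thread_needs t = { .needs_big_only_isa = false,
                                  .latency_sensitive = false };
        printf("core class: %s\n",
               select_core_class(GOAL_POWER, &t) ? "big" : "little");
        return 0;
    }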

Continuing in block 350, the system optionally handles any capability differences between the thread and the selected physical processing core. For example, if the thread includes an instruction that is not available on the selected core, the system may emulate the instruction or replace the instruction with one or more equivalent instructions that are supported by the selected core. Managing capability differences adds a significant amount of complexity to the system, and the hypervisor implementer may choose how many (if any) of the differences in capabilities between processing cores any particular implementation will support.

Continuing in block 360, the system schedules the thread to execute on the selected physical processing core. The system also handles any output and provides the output back to the operating system, making the output appear to come from the virtual core to which the operating system assigned the thread. Thus, the operating system is kept unaware of the type and number of cores managed by the hypervisor and uses the set of virtual cores as the operating system normally would in a system without the hypervisor and heterogeneous processing cores. After block 360, these steps conclude.

FIG. 4 is a block diagram that illustrates an operating environment of the heterogeneous processing system, in one embodiment. A guest operating system 410 sees one or more virtual processing cores 420 presented by the software hypervisor 430 that implements the heterogeneous processing system described herein. The software hypervisor 430 manages heterogeneous processing hardware 440 that includes a global interrupt controller 450, one or more big cores 460, and one or more little cores 470. Note that although only two core types are shown (big and little) for ease of illustration, the system can operate with any number of different cores. For example, some processor packages may include several processing cores that gradually step down in power usage and performance. The software hypervisor 430 manages the heterogeneous processing hardware 440 to isolate the guest operating system 410 from any special knowledge or processing needed to use the hardware 440 effectively. Thus, the software hypervisor 430 allows an unmodified legacy operating system, such as guest operating system 410, to leverage newer heterogeneous processing hardware 440. The hypervisor 430 can be modified to keep up with hardware 440 changes, while the operating system 410 continues to operate as normal (although with better and better power/performance characteristics).

In some embodiments, the heterogeneous processing system migrates a thread from one physical processing core to another after the thread has already begun executing. In some cases, one or more threads may already be executing when the hypervisor decides to reduce power consumption, increase performance, or carry out other policy goals. The cores may share cache storage or other facilities, so that the hypervisor can migrate the thread to another core without affecting the thread's access to data. Thus, the hypervisor may interrupt the thread's execution, move the thread's instruction stream to a different physical processing core, and resume execution on the target core.
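The interrupt/move/resume sequence can be sketched as follows, where the context fields and helper functions stand in for architecture-specific hypervisor mechanisms and the constants are placeholders:

    #include <stdint.h>
    #include <stdio.h>

    struct thread_ctx {
        uint64_t pc;        /* program counter of the interrupted stream */
        uint64_t sp;        /* stack pointer */
        uint64_t regs[16];  /* general-purpose register snapshot */
    };

    static void stop_and_save(int pcore, struct thread_ctx *ctx) {
        (void)pcore;
        /* In a real hypervisor: interrupt the core, trap into the VMM, and
         * copy the guest register state out of the virtual CPU structure. */
        ctx->pc = 0x1000; ctx->sp = 0x7fff0000;  /* placeholder values */
    }

    static void resume_on(int pcore, const struct thread_ctx *ctx) {
        /* In a real hypervisor: load the saved state into the target core's
         * virtual CPU and re-enter the guest; coherent caches keep the
         * thread's data reachable from the new core. */
        printf("resuming pc=%#llx on core %d\n",
               (unsigned long long)ctx->pc, pcore);
    }

    static void migrate(int from, int to) {
        struct thread_ctx ctx = {0};
        stop_and_save(from, &ctx);
        resume_on(to, &ctx);
    }

    int main(void) { migrate(/*from big*/2, /*to little*/0); return 0; }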

In some embodiments, the heterogeneous processing system employs processor voltage and frequency modifications to reduce power or increase performance up to a threshold before selecting a different core. For example, the system may start out executing a particular thread on a big core, then scale back the big core's power usage by reducing the core's operating voltage, and finally may migrate the big core's work to a little core. This allows the system to step down power usage gradually to manage a thermal envelope or satisfy other computing goals specified by policy.
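The gradual step-down might look like the following sketch, where the discrete DVFS levels and the migration hook are assumptions for illustration:

    #include <stdio.h>

    #define DVFS_FLOOR 0  /* lowest voltage/frequency operating point */

    static int dvfs_level = 3;  /* current big-core operating point */

    /* Lower the big core's operating point one level at a time; only once
     * the floor is reached does the work migrate to a little core. */
    static void step_down_power(void) {
        if (dvfs_level > DVFS_FLOOR) {
            dvfs_level--;  /* stay on the big core, just slower, lower volts */
            printf("big core DVFS level -> %d\n", dvfs_level);
        } else {
            printf("DVFS floor reached: migrating work to a little core\n");
            /* migrate(...) as in the previous sketch */
        }
    }

    int main(void) {
        for (int i = 0; i < 5; i++) step_down_power();
        return 0;
    }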

In some embodiments, the heterogeneous processing system allows some processing tasks to be migrated to a cloud computing facility. The system can present the cloud computing facility as just another processing core to which tasks can be scheduled. For appropriate tasks, the system may be able to offload the tasks from the computing device entirely and later return the output of the task to the guest operating system. This may allow the system to enter a lower power state on the computing device or to transition work from a datacenter at peak electricity cost to one of lower electricity cost.

In some embodiments, the heterogeneous processing system handles race conditions and employs software-locking paradigms to manage operating system expectations. In many cases, operating systems schedule threads based on interdependencies or lack of dependencies between particular threads. Software may leverage locks, mutexes, semaphores, or other synchronization primitives provided by the operating system to allow the software code to operate correctly in an environment of multiple simultaneously executing threads. The heterogeneous computing system ensures that the operating system's guarantees about thread safety and other synchronization are met, and may introduce additional locks or determine thread scheduling in a manner that ensures no new race conditions or other problems are introduced.

In some embodiments, the heterogeneous processing system includes a hardware hypervisor. Although a software hypervisor has been used in examples herein, those of ordinary skill in the art will recognize that the choice of hardware or software for implementation of computing tasks is often an implementation detail that can be switched to meet performance or other goals. Thus, the system can be implemented with a hardware hypervisor, and some processing units may be manufactured to include the system in the processing unit itself.

From the foregoing, it will be appreciated that specific embodiments of the heterogeneous processing system have been described herein for purposes of illustration, but that various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.

I/We claim:
1. A computer-implemented method for providing operating system-decoupled heterogeneous computing through a hypervisor, the method comprising: determining capabilities of two or more physical processing cores accessible to a computing device, wherein the capabilities include one or more power profiles offered by each physical processing core and performance characteristics of each physical processing core; accessing hypervisor policy information that specifies one or more goals for scheduling operating system threads on the physical processing cores; creating one or more virtual cores to expose to an operating system, wherein each virtual core isolates the operating system from determined differences in capabilities among the physical processing cores; and invoking the operating system and presenting the one or more virtual cores to the operating system while isolating the operating system from the physical processing cores.
2. The computer-implemented method of claim 1, further comprising: activating the hypervisor, the hypervisor interfacing between the operating system and the physical processing cores; and identifying at least one operating system for which the hypervisor will manage access to and scheduling of the physical processing cores.
3. The computer-implemented method of claim 1, wherein the one or more goals include one or more of a performance goal and/or a power usage goal.
4. The computer-implemented method of claim 1, wherein the hypervisor policy information includes one or more of Minimum Power, Maximum Performance, Minimal Power Performance on Demand, and/or Maximum Performance Power Down on Idle.
5. The computer-implemented method of claim 1, further comprising: receiving a thread scheduling request from the operating system to run a thread on an identified virtual core; and selecting at least one physical processing core on which to execute the thread.
6. The computer-implemented method of claim 5, wherein the selection is based on a scheduling policy and available system facilities.
7. The computer-implemented method of claim 6, wherein the scheduling policy is based on power usage.
8. A computer-readable storage medium, not comprising a signal per se, including instructions that, upon execution, cause a processor to perform actions comprising: determining capabilities of two or more physical processing cores accessible to a computing device, wherein the capabilities include one or more power profiles offered by each physical processing core and performance characteristics of each physical processing core; accessing hypervisor policy information that specifies one or more goals for scheduling operating system threads on the physical processing cores; creating one or more virtual cores to expose to an operating system, wherein each virtual core isolates the operating system from determined differences in capabilities among the physical processing cores; and invoking the operating system and presenting the one or more virtual cores to the operating system while isolating the operating system from the physical processing cores.
9. The computer-readable storage medium of claim 8, further comprising: activating the hypervisor, the hypervisor interfacing between the operating system and the physical processing cores; and identifying at least one operating system for which the hypervisor will manage access to and scheduling of the physical processing cores.
10. The computer-readable storage medium of claim 8, wherein the one or more goals include one or more of a performance goal and/or a power usage goal.
11. The computer-readable storage medium of claim 8, wherein the hypervisor policy information includes one or more of Minimum Power, Maximum Performance, Minimal Power Performance on Demand, and/or Maximum Performance Power Down on Idle.
12. The computer-readable storage medium of claim 8, further comprising: receiving a thread scheduling request from the operating system to run a thread on an identified virtual core; and selecting at least one physical processing core on which to execute the thread.
13. The computer-readable storage medium of claim 12, wherein the selection is based on a scheduling policy and available system facilities.
14. The computer-readable storage medium of claim 13, wherein the scheduling policy is based on power usage.
15. A computer system for providing operating system-decoupled heterogeneous computing, the system comprising: two or more physical processing cores; a memory, the memory including: hypervisor policy information that specifies one or more goals for scheduling operating system threads, wherein at least one goal is based on power usage; and a hypervisor, interfacing between an operating system and the two or more physical processing cores, the hypervisor: determining capabilities of the two or more physical processing cores, wherein the capabilities include one or more power profiles offered by each physical processing core and performance characteristics of each physical processing core; creating one or more virtual cores to expose to the operating system based on the hypervisor policy information; and invoking the operating system and presenting the one or more virtual cores to the operating system while isolating the operating system from the physical processing cores.
16. The computer system of claim 15, wherein the hypervisor policy information includes one or more of Minimum Power, Maximum Performance, Minimal Power, Performance on Demand, and/or Power Down on Idle.
17. The computer system of claim 15, further comprising: receiving a thread scheduling request from the operating system to run a thread on an identified virtual core; and selecting at least one physical processing core on which to execute the thread.
18. The computer system of claim 17, wherein the selection is based on a scheduling policy and available system facilities.
19. The computer system of claim 17, wherein the hypervisor migrates the thread to another physical processing core after the thread has begun executing.
20. The computer system of claim 17, wherein the hypervisor employs processor voltage and frequency modifications to reduce power or increase performance up to a threshold before selecting a core.